An Accurate Grid-based PAM Clustering Method for Large Dataset
Abstract
Clustering is the task of grouping similar objects together, and many algorithms have been proposed for it. Among them, K-means has low time complexity, but it is sensitive to extreme values, which can reduce clustering accuracy. The K-medoids method does not have this limitation, but it requires a user-defined value for K; if the number of clusters is chosen poorly, the result does not reflect the natural number of clusters in the data and accuracy suffers. In this paper, we propose a grid-based clustering method, Grid Multi-dimensional K-medoids (GMK), which uses the concept of a cluster validity index. The object space is quantized into a number of cells, which decreases the distances between objects within a cluster and contributes to the higher accuracy of the proposed method. Experimental results show that GMK is more accurate than the existing K-medoids method. The proposed approach therefore provides a natural clustering and scales well to large datasets.
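The abstract names three ingredients: quantizing the object space into cells, K-medoids clustering, and a cluster validity index for choosing K. The sketch below shows one way these pieces could fit together. It is not the authors' GMK algorithm, whose details the abstract does not give. All function names and the `cell_size` parameter are hypothetical, the mean silhouette width stands in for the unspecified validity index, and a simple alternating K-medoids loop is used in place of PAM's full swap search.

```python
import math
from collections import defaultdict

def quantize(points, cell_size):
    """Bucket points into grid cells; return the mean point of each non-empty cell."""
    cells = defaultdict(list)
    for p in points:
        cells[tuple(math.floor(x / cell_size) for x in p)].append(p)
    return [tuple(sum(axis) / len(members) for axis in zip(*members))
            for _, members in sorted(cells.items())]

def assign(points, medoids):
    """Label each point with the index of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda j: math.dist(p, medoids[j]))
            for p in points]

def k_medoids(points, k, max_iter=100):
    """Alternating k-medoids; a simpler stand-in for PAM's full swap search."""
    # Deterministic farthest-first initialisation (an assumption of this sketch).
    medoids = [points[0]]
    while len(medoids) < k:
        medoids.append(max(points, key=lambda p: min(math.dist(p, m) for m in medoids)))
    for _ in range(max_iter):
        labels = assign(points, medoids)
        new_medoids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if not members:  # keep the old medoid if a cluster empties
                new_medoids.append(medoids[j])
                continue
            # The medoid is the member with the smallest total distance to the rest.
            new_medoids.append(
                min(members, key=lambda c: sum(math.dist(c, q) for q in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(points, medoids)

def silhouette(points, labels):
    """Mean silhouette width, used here as the cluster validity index."""
    clusters = defaultdict(list)
    for p, lab in zip(points, labels):
        clusters[lab].append(p)
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:  # singleton clusters score 0 by convention
            continue
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for other_lab, other in clusters.items() if other_lab != lab)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)

def choose_k(points, k_values):
    """Return (k, medoids, labels) with the highest silhouette among k_values."""
    best = None
    for k in k_values:
        medoids, labels = k_medoids(points, k)
        score = silhouette(points, labels)
        if best is None or score > best[0]:
            best = (score, k, medoids, labels)
    return best[1:]

# Toy data: two well-separated groups (made up for illustration only).
data = [(0.1, 0.2), (0.3, 0.1), (0.2, 0.4), (0.8, 0.9), (1.2, 1.1), (1.4, 1.3),
        (10.1, 10.2), (10.3, 10.1), (10.8, 10.9), (11.2, 11.1), (11.4, 11.3)]
reps = quantize(data, cell_size=1.0)
best_k, medoids, rep_labels = choose_k(reps, k_values=range(2, 4))
print(best_k, medoids)
```

Clustering the cell representatives rather than the raw points is what keeps the pairwise-distance work manageable on a large dataset in this sketch. The trade-off is that `cell_size` becomes a tuning parameter, and cells that are too coarse can merge clusters that are close together.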
Similar resources
Optimizing the Grade Classification Model of Mineralized Zones Using a Learning Method Based on Harmony Search Algorithm
The classification of mineralized areas into different groups based on mineral grade and prospectivity is a practical problem in the area of optimal risk, time, and cost management of exploration projects. The purpose of this paper was to present a new approach for optimizing the grade classification model of an orebody. That is to say, through hybridizing machine learning with a metaheuristic ...
An improved opposition-based Crow Search Algorithm for Data Clustering
Data clustering is an ideal way of working with a huge amount of data and looking for a structure in the dataset. In other words, clustering is the classification of the same data; the similarity among the data in a cluster is maximum and the similarity among the data in the different clusters is minimal. The innovation of this paper is a clustering method based on the Crow Search Algorithm (CS...
Prediction of slope stability using adaptive neuro-fuzzy inference system based on clustering methods
Slope stability analysis is an enduring research topic in the engineering and academic sectors. Accurate prediction of the factor of safety (FOS) of slopes, their stability, and their performance is not an easy task. In this work, the adaptive neuro-fuzzy inference system (ANFIS) was utilized to build an estimation model for the prediction of FOS. Three ANFIS models were implemented including g...
Data clustering based on key identification
Clustering has been one of the main building blocks in the fields of machine learning and computer vision. Given a pair-wise distance measure, it is challenging to find a proper way to identify a subset of representative exemplars and its associated cluster structures. Recent trend on big data analysis poses a more demanding requirement on new clustering algorithm to be both scalable and accura...
Clustering of nasopharyngeal carcinoma intensity modulated radiation therapy plans based on k-means algorithm and geometrical features
Background: The design of intensity modulated radiation therapy (IMRT) plans is difficult and time-consuming. The retrieval of similar IMRT plans from the IMRT plan dataset can effectively improve the quality and efficiency of IMRT plans and automate the design of IMRT planning. However, the large IMRT plans datasets will bring inefficient retrieval result. Materials and Methods: An intensity-m...
Publication date: 2012